Nao Mimoto - Dept. of Statistics : The University of Akron
TS Class Web Page – R resource page
See Shumway p145 GNP modeling with ARIMA
D <- read.csv("https://nmimoto.github.io/datasets/dowj.csv", header=TRUE)
Dowj <- ts(D, start=1, freq=1) # turn D into TS object
plot(Dowj, type="o")     # plot with both line and circle

D <- read.csv("http://gozips.uakron.edu/~nmimoto/pages/datasets/lake.csv", header=TRUE)
Lake <- ts(D, start=1875, freq=1)   # turn it into TS object starting year 1875
plot(Lake, type="o")

Global mean land-ocean temperature data from Shumway p.4
library(astsa) # do install.packages("astsa") if not installed yet
data(gtemp) # load data that is inside astsa package
plot(gtemp, type="o", ylab="Global Temp Deviations")

How do you check if a time series is stationary?
Visual inspection for constant mean, constant variance
KPSS test (\(H_0\): Stationary)
ADF test (\(H_0\): Not Stationary)
PP test (\(H_0\): Not Stationary)
Fit AR(1), and see if \(\hat \phi_1\) is significantly different from 1.
Kwiatkowski-Phillips-Schmidt-Shin (1992) test for stationarity
Default choice in auto.arima() \[ \left\{ \begin{array}{ll} H_0: X_t \mbox{ is trend stationary}\\ H_A: X_t \mbox{ is not stationary} \\ \end{array} \right. \]
Large p-value means “Stationary”.
Decompose the series as \[ X_t \hspace{3mm} = \hspace{3mm} T_t + W_t + Y_t \] where \(T_t\) is deterministic trend, \(W_t\) is random walk, and \(Y_t\) is stationary error.
Lagrange multiplier test that the random walk has zero variance.
Augmented Dickey-Fuller test (Brockwell p. 194)
“Unit-root” test
The stationarity hypothesis \[ \left\{ \begin{array}{ll} H_0: Y_t \mbox{ is not stationary} \\ H_A: Y_t \mbox{ is stationary} \end{array} \right. \]
Is replaced by \[ \left\{ \begin{array}{ll} H_0: Y_t \mbox{ has unit root} \\ H_A: Y_t \mbox{ does not have unit root } \end{array} \right. \]
Small p-value rejects the null hypothesis of non-stationarity, and means “Stationary”.
Fit AR(1) to a time series, and test if \(\phi_1\) is significantly different from 1, by \[ \frac{ \hat \phi_1 -1 }{ SE(\hat \phi_1) } \sim N(0,1) \]
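As a hedged sketch (simulated data, not one of the course datasets), the statistic above can be computed from `arima()` output; the AR coefficient and its standard error come from `fit$coef` and `fit$var.coef`:

```r
# Sketch with simulated data: fit AR(1), then test phi1 = 1
set.seed(2)
X <- arima.sim(list(ar = 0.5), n = 200)   # stationary AR(1) with phi1 = 0.5

fit <- arima(X, order = c(1, 0, 0))       # fit AR(1) with mean
phi <- fit$coef["ar1"]                    # estimate of phi1
se  <- sqrt(fit$var.coef["ar1", "ar1"])   # its standard error
z   <- (phi - 1) / se                     # compare to N(0,1) quantiles
z                                         # well below -1.96: reject phi1 = 1
```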
Phillips-Perron test for the null hypothesis that x has a unit root.
Same Hypothesis as ADF test
Small p-value means “Stationary”.
The tests are similar to ADF tests, but they incorporate an automatic correction to the DF procedure to allow for autocorrelated residuals.
Calculation of the test statistics is complex.
The tests usually give the same conclusions as the ADF tests.
P-value of (KPSS, ADF, PP) tests
* (LARGE, small, small) \hspace{3mm} $\to$ All three indicating stationarity.
* (small, LARGE, LARGE) \hspace{3mm} $\to$ All three indicating non-stationarity.
* (anything else) \hspace{3mm} $\to$ Conflicting conclusions, inconclusive.
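The three-test comparison can be sketched with a small helper. This is only an illustration assuming the `tseries` package; it is NOT the course's `Stationarity.tests()`, which is defined in the file sourced below:

```r
# Illustrative helper (not the course's Stationarity.tests()).
# Assumes package 'tseries' is installed.
library(tseries)

my.stationarity.tests <- function(A) {
  p <- suppressWarnings(c(KPSS = kpss.test(A)$p.value,
                          ADF  = adf.test(A)$p.value,
                          PP   = pp.test(A)$p.value))
  round(p, 3)   # (LARGE, small, small) indicates stationarity
}

set.seed(477)
p <- my.stationarity.tests(rnorm(100))   # i.i.d. sample: expect stationarity
p
```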
#- Load Stationarity.tests() from the course web page
#-   (same file as Randomness.tests())
source("https://nmimoto.github.io/477/TS_R-90.txt")
#--- Test it on random sample ---
X <- rnorm(100)
plot(X, type="o")

Stationarity.tests(X)
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
## KPSS ADF PP
## p-val: 0.1 0.01 0.01
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.01 0.706 0.01
D <- read.csv("https://nmimoto.github.io/datasets/Steel.csv")
D1 <- ts(D[,2], start=c(1956,1), freq=12)
D2 <- window(D1, start=c(1969, 12))
plot(D1, type="o")

Stationarity.tests(D1)
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.01 0.408 0.01
Stationarity.tests(D2)
## Warning in pp.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.01 0.049 0.01
D4 <- window(D2, end=c(1981, 12)) # Dec 1969 to Dec 1981
D5 <- window(D2, start=c(1984, 1))  # Jan 1984 to Nov 1993
Stationarity.tests(D4)
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.01 0.01 0.01
Stationarity.tests(D5)
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
## KPSS ADF PP
## p-val: 0.1 0.046 0.01
If we treat it as a “stationary” series, we can try to fit an ARMA(p,q) model.
library(forecast)
source('https://nmimoto.github.io/R/TS-00.txt')
D <- read.csv("https://nmimoto.github.io/datasets/lake.csv")
Lake <- ts(D, start=1875, freq=1)
plot(Lake, type="o")

Stationarity.tests(Lake)
## Warning in kpss.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.01 0.24 0.025
# find best ARMA(p,q) by AICc
Fit1 <- auto.arima(Lake, d=0, stepwise=FALSE, approximation=FALSE)
Fit1
## Series: Lake
## ARIMA(1,0,1) with non-zero mean
##
## Coefficients:
## ar1 ma1 mean
## 0.7665 0.3393 9.1290
## s.e. 0.0773 0.1123 0.3861
##
## sigma^2 estimated as 0.4784: log likelihood=-101.09
## AIC=210.18 AICc=210.62 BIC=220.48
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.963 0.952 0.934 0.567 0.641 0.89 0.684
auto.arima() chooses ARMA(1,1) with the minimum AICc.
ARMA(1,1) with constant mean was fit directly to the data. With observation \(Y_t\),
\[ {\large Y_t \hspace{3mm} = \hspace{3mm} \mu + X_t } \\ \] \[ {\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} \phi_1 X_{t-1} + e_t + \theta_1 e_{t-1} } \\ \] \[ {\large e_t \sim WN(0, \sigma^2) } \]
## Series: Lake
## ARIMA(1,0,1) with non-zero mean
##
## Coefficients:
## ar1 ma1 mean
## 0.7665 0.3393 9.1290
## s.e. 0.0773 0.1123 0.3861
##
## sigma^2 estimated as 0.4784: log likelihood=-101.09
## AIC=210.18 AICc=210.62 BIC=220.48
If we believe that the level is going down linearly, we can try to fit a line with ARMA errors.
Reg2 <- lm(Lake ~ time(Lake))
summary(Reg2)
##
## Call:
## lm(formula = Lake ~ time(Lake))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.50919 -0.74760 -0.01556 0.75966 2.53409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.278141 7.922614 6.977 4.02e-10 ***
## time(Lake) -0.024071 0.004119 -5.843 7.16e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.136 on 95 degrees of freedom
## Multiple R-squared: 0.2644, Adjusted R-squared: 0.2566
## F-statistic: 34.14 on 1 and 95 DF, p-value: 7.165e-08
# fit the residuals with ARMA(p,q)
Fit2 <- auto.arima(Reg2$residuals, d=0, stepwise=FALSE, approximation=FALSE)
Fit2
## Series: Reg2$residuals
## ARIMA(1,0,1) with zero mean
##
## Coefficients:
## ar1 ma1
## 0.6671 0.3827
## s.e. 0.0937 0.1135
##
## sigma^2 estimated as 0.452: log likelihood=-98.72
## AIC=203.44 AICc=203.7 BIC=211.16
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.985 0.974 0.969 0.19 0.189 0.782 0.669
A linear trend was fit to \(Y_t\); the residuals were then fit with ARMA(1,1).
We observe \(Y_t = a + bt + X_t\), where \(X_t\) is ARMA(1,1).
In other words,
\[{\large Y_t \hspace{3mm} = \hspace{3mm} a + bt + X_t }\]
\[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} \phi_1 X_{t-1} + e_t + \theta_1 e_{t-1} }\]
\[{\large e_t \sim WN(0, \sigma^2) }\]
##
## Call:
## lm(formula = Lake ~ time(Lake))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.50919 -0.74760 -0.01556 0.75966 2.53409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.278141 7.922614 6.977 4.02e-10 ***
## time(Lake) -0.024071 0.004119 -5.843 7.16e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.136 on 95 degrees of freedom
## Multiple R-squared: 0.2644, Adjusted R-squared: 0.2566
## F-statistic: 34.14 on 1 and 95 DF, p-value: 7.165e-08
## Series: Reg2$residuals
## ARIMA(1,0,1) with zero mean
##
## Coefficients:
## ar1 ma1
## 0.6671 0.3827
## s.e. 0.0937 0.1135
##
## sigma^2 estimated as 0.452: log likelihood=-98.72
## AIC=203.44 AICc=203.7 BIC=211.16
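As an aside, the two steps above (lm(), then auto.arima() on the residuals) can also be done jointly. A hedged sketch with simulated data, using base-R arima() and its xreg argument:

```r
# Sketch: jointly estimate a + b*t and ARMA(1,1) errors via xreg
set.seed(5)
tt <- 1:200
Y  <- 10 - 0.02 * tt + arima.sim(list(ar = 0.6, ma = 0.4), n = 200)

fit <- arima(Y, order = c(1, 0, 1), xreg = cbind(trend = tt))
round(coef(fit), 3)   # ar1, ma1, intercept (a), and slope (b) for trend
```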
Let’s look at the difference between the observations: \[ \mathrm{diff}(Y)_t \hspace{3mm} = \hspace{3mm} Y_t - Y_{t-1} \]
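A quick sanity check of R's diff() on a toy vector, which implements \(Y_t - Y_{t-1}\) and drops the first observation:

```r
# diff() computes Y_t - Y_{t-1}; the result is one observation shorter
Y <- c(1, 3, 6, 10)
diff(Y)            # 2 3 4
length(diff(Y))    # 3, one shorter than Y
```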
diff.Y <- diff(Lake)
Stationarity.tests(diff.Y)
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
## KPSS ADF PP
## p-val: 0.1 0.01 0.01
# Fit the difference of Lake with ARMA
Fit3 <- auto.arima(diff.Y, d=0, stepwise=FALSE, approximation=FALSE)
Fit3
## Series: diff.Y
## ARIMA(1,0,2) with zero mean
##
## Coefficients:
## ar1 ma1 ma2
## 0.6385 -0.5349 -0.3514
## s.e. 0.1345 0.1445 0.1055
##
## sigma^2 estimated as 0.4812: log likelihood=-99.88
## AIC=207.76 AICc=208.2 BIC=218.02
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.981 0.958 0.946 0.551 0.545 0.575 0.681
## Series: Lake
## ARIMA(1,1,2)
##
## Coefficients:
## ar1 ma1 ma2
## 0.6385 -0.5349 -0.3514
## s.e. 0.1345 0.1445 0.1055
##
## sigma^2 estimated as 0.4812: log likelihood=-99.88
## AIC=207.76 AICc=208.2 BIC=218.02
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.979 0.956 0.942 0.524 0.495 0.577 0.677
Difference of \(Y_t\) was taken. The difference seems to be ARMA(1,2).
In other words, $ Y_t $ is ARIMA(1,1,2), meaning \[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = X_t }\] \[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} \phi_1 X_{t-1} + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} }\] \[{\large e_t \sim WN(0,\sigma^2) }\]
## Series: Lake
## ARIMA(1,1,2)
##
## Coefficients:
## ar1 ma1 ma2
## 0.6385 -0.5349 -0.3514
## s.e. 0.1345 0.1445 0.1055
##
## sigma^2 estimated as 0.4812: log likelihood=-99.88
## AIC=207.76 AICc=208.2 BIC=218.02
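To see the equivalence, here is a hedged sketch with simulated data: fitting ARIMA(1,1,2) to \(Y_t\) with base-R arima() gives the same coefficients as fitting a zero-mean ARMA(1,2) to diff(Y):

```r
# Simulated check: ARIMA(1,1,2) on Y  ==  zero-mean ARMA(1,2) on diff(Y)
set.seed(3)
Y <- cumsum(arima.sim(list(ar = 0.6, ma = c(-0.5, -0.3)), n = 300))

fit.diff  <- arima(diff(Y), order = c(1, 0, 2), include.mean = FALSE)
fit.arima <- arima(Y,       order = c(1, 1, 2))

round(coef(fit.diff), 3)     # ar1, ma1, ma2
round(coef(fit.arima), 3)    # identical up to numerical precision
```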
Let’s look at the difference between the observations: \[ \mathrm{diff}(Y)_t \hspace{3mm} = \hspace{3mm} Y_t - Y_{t-1} \]
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.176 0.175 0.132 0.172 0.117 0.384 0.737
## Series: Lake
## ARIMA(0,1,0)
##
## sigma^2 estimated as 0.5383: log likelihood=-106.49
## AIC=214.98 AICc=215.02 BIC=217.54
Difference of \(Y_t\) was taken. The difference seems to be WN.
In other words, $ Y_t $ is ARIMA(0,1,0), meaning \[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = X_t }\] \[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} e_t }\] \[{\large e_t \sim WN(0,\sigma^2) }\]
## Series: Lake
## ARIMA(0,1,0)
##
## sigma^2 estimated as 0.5383: log likelihood=-106.49
## AIC=214.98 AICc=215.02 BIC=217.54
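ARIMA(0,1,0) is exactly a random walk; a minimal simulated check that differencing recovers the white-noise increments:

```r
# Random walk: Y_t = Y_{t-1} + e_t, so diff(Y) is the noise itself
set.seed(4)
e <- rnorm(200)
Y <- cumsum(e)                 # random walk
all.equal(diff(Y), e[-1])      # TRUE (up to floating-point rounding)
```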